Learning to Align Sequences: A Maximum-Margin Approach

Authors

  • Thorsten Joachims
  • Tamara Galor
  • Ron Elber
Abstract

We propose a discriminative method for learning the parameters of linear sequence alignment models from training examples. Compared to conventional generative approaches, the discriminative method is straightforward to use when operations (e.g. substitutions, deletions, insertions) and sequence elements are described by vectors of attributes. This makes it possible to learn more flexible and more complex alignment models. Although training leads to an optimization problem with an exponential number of constraints, we present a simple algorithm that finds an arbitrarily close approximation after considering only a subset of the constraints that is linear in the number of training examples and polynomial in the length of the sequences. We also show empirically that the method learns good parameter values while remaining computationally feasible.
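
To make the notion of a linear alignment model concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: it assumes one scalar weight per operation type (match, substitute, insert, delete), whereas the abstract allows operations to be described by attribute vectors. The names `best_alignment`, `linear_score`, and `OPS` are hypothetical. The alignment score is the dot product of the weights with the operation counts, and dynamic programming finds the highest-scoring global alignment. The training step that chooses the weights is not shown.

```python
from typing import Dict, Tuple

# Hypothetical operation set; the paper's models may use richer attribute vectors.
OPS = ("match", "substitute", "insert", "delete")


def best_alignment(s: str, t: str, w: Dict[str, float]) -> Tuple[float, Dict[str, int]]:
    """Return the best global alignment score of s and t and its operation counts."""
    n, m = len(s), len(t)
    # score[i][j]: best score for aligning s[:i] with t[:j]
    # back[i][j]: last operation used on a best path to (i, j)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + w["delete"]
        back[i][0] = "delete"
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + w["insert"]
        back[0][j] = "insert"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag_op = "match" if s[i - 1] == t[j - 1] else "substitute"
            candidates = [
                (score[i - 1][j - 1] + w[diag_op], diag_op),
                (score[i - 1][j] + w["delete"], "delete"),
                (score[i][j - 1] + w["insert"], "insert"),
            ]
            score[i][j], back[i][j] = max(candidates, key=lambda c: c[0])
    # Trace back from (n, m) to count the operations of one best alignment.
    counts = {op: 0 for op in OPS}
    i, j = n, m
    while i > 0 or j > 0:
        op = back[i][j]
        counts[op] += 1
        if op in ("match", "substitute"):
            i, j = i - 1, j - 1
        elif op == "delete":
            i -= 1
        else:
            j -= 1
    return score[n][m], counts


def linear_score(counts: Dict[str, int], w: Dict[str, float]) -> float:
    """Score an alignment as the dot product of weights and operation counts."""
    return sum(w[op] * counts[op] for op in OPS)
```

Because the score is linear in the weights, a requirement such as "the known correct alignment must outscore every other alignment" becomes a set of linear constraints on `w`, which is the kind of constraint set the abstract describes as exponentially large.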

Similar Resources

PALMA: Perfect Alignments using Large Margin Algorithms

Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. We present a novel approach based on large margin learning that combines kernel based splice site predictions with common sequence alignment techniques. By solving a conve...

A Systemic Approach to Improving Teaching and Learning

This paper describes one university’s approach to improving the quality of teaching and learning at the institutional level, based on the premise of improving the design of curriculum rather than focusing on the skills of teachers as such. The paper describes the process by which university-wide principles of curriculum design were defined and agreed, as well as the parallel campaigns needed to...

Deep Transductive Semi-supervised Maximum Margin Clustering

Semi-supervised clustering is a very important topic in machine learning and computer vision. The key challenge of this problem is how to learn a metric such that instances sharing the same label are more likely to be close to each other in the embedded space. However, little attention has been paid to learning better representations when the data lie on a non-linear manifold. Fortunately, deep lear...

Maximum Relative Margin and Data-Dependent Regularization

Leading classification methods such as support vector machines (SVMs) and their counterparts achieve strong generalization performance by maximizing the margin of separation between data classes. While the maximum margin approach has achieved promising performance, this article identifies its sensitivity to affine transformations of the data and to directions with large data spread. Maximum mar...

Publication date: 2003